semantic kernel
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education (0.67)
- Information Technology (0.46)
MiniCPM4: Ultra-Efficient LLMs on End Devices
MiniCPM Team, Xiao, Chaojun, Li, Yuxuan, Han, Xu, Bai, Yuzhuo, Cai, Jie, Chen, Haotian, Chen, Wentong, Cong, Xin, Cui, Ganqu, Ding, Ning, Fan, Shengda, Fang, Yewei, Fu, Zixuan, Guan, Wenyu, Guan, Yitong, Guo, Junshao, Han, Yufeng, He, Bingxiang, Huang, Yuxiang, Ji, Baoxi, Kong, Cunliang, Li, Qiuzuo, Li, Siyuan, Li, Wenhao, Li, Xin, Li, Yanghao, Li, Yishan, Li, Zhen, Liu, Dan, Lin, Biyuan, Lin, Yankai, Long, Xiang, Lu, Quanyu, Lu, Yaxi, Luo, Peiyan, Lyu, Hongya, Ou, Litu, Pan, Yinxu, Pu, Lushi, Qu, Zekai, Shi, Qundong, Song, Zijun, Su, Jiayuan, Su, Zhou, Sun, Ao, Sun, Xianghui, Tang, Peijun, Wang, Fangzheng, Wang, Feng, Wang, Shuo, Wang, Yudong, Wang, Zheng, Wu, Yesai, Xiao, Zhenyu, Xie, Jie, Xie, Zihao, Xu, Xiaoyue, Yan, Yukun, Yuan, Jiarui, Zhang, Jinqian, Zhang, Kaihuo, Zhang, Lei, Zhang, Linyue, Zhang, Xueren, Zhang, Yudi, Zhao, Hengyu, Zhao, Weilin, Zhao, Weilun, Zhao, Yuanqian, Zheng, Zhi, Zhou, Chuyue, Zhou, Ge, Zhou, Jie, Zhou, Wei, Zhou, Yanghao, Zhou, Zihan, Zhou, Zixuan, Liu, Zhiyuan, Zeng, Guoyang, Jia, Chao, Li, Dahai, Sun, Maosong
This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and BitCPM, a data-efficient ternary LLM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Furthermore, we construct a hybrid reasoning model, MiniCPM4.1, which can be used in both deep reasoning mode and non-reasoning mode. Evaluation results demonstrate that MiniCPM4 and MiniCPM4.1 outperform similar-sized open-source models across benchmarks, with the 8B variants showing significant speed improvements on long sequence understanding and generation.
- Education (0.67)
- Information Technology (0.46)
- Energy (0.45)
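The abstract does not spell out how InfLLM v2 works, but the general idea behind trainable block-sparse attention for long contexts can be illustrated with a toy sketch: score key *blocks* cheaply, keep only the top-k blocks per query, and attend densely within the kept blocks. Everything below (block-mean scoring, the function name, the shapes) is an illustrative assumption, not the paper's actual mechanism.

```python
import numpy as np

def block_sparse_attention(q, K, V, block_size=4, top_k=2):
    """Toy sketch of block-sparse attention for one query vector.

    Keys are grouped into contiguous blocks; each block is scored by its
    mean key vector, only the top_k blocks are kept, and standard softmax
    attention runs over the surviving keys.
    """
    n, d = K.shape
    n_blocks = n // block_size
    # Cheap block summaries: mean key vector per block.
    block_means = K[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    block_scores = block_means @ q                      # one score per block
    keep = np.argsort(block_scores)[-top_k:]            # indices of top-k blocks
    idx = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in keep]
    )
    # Dense attention restricted to the selected keys.
    scores = K[idx] @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[idx]
```

With `top_k` equal to the total number of blocks this reduces to ordinary dense attention, which is a useful sanity check; the savings come from choosing `top_k` far smaller than the block count at long sequence lengths.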
When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust "APIs" for Human-AI Interaction
Xing, Zhenchang, Liu, Yang, Cheng, Zhuo, Huang, Qing, Zhao, Dehai, Sun, Daniel, Liu, Chenhua
With the growing capabilities of large language models (LLMs), they are increasingly applied in areas like intelligent customer service, code generation, and knowledge management. Natural language (NL) prompts act as the "APIs" for human-LLM interaction. To improve prompt quality, best practices for prompt engineering (PE) have been developed, including writing guidelines and templates. Building on this, we propose Controlled NL for Prompt (CNL-P), which not only incorporates PE best practices but also draws on key principles from software engineering (SE). CNL-P introduces precise grammar structures and strict semantic norms, further eliminating NL's ambiguity, allowing for a declarative but structured and accurate expression of user intent. This helps LLMs better interpret and execute the prompts, leading to more consistent and higher-quality outputs. We also introduce an NL2CNL-P conversion tool based on LLMs, enabling users to write prompts in NL, which are then transformed into CNL-P format, thus lowering the learning curve of CNL-P. In particular, we develop a linting tool that checks CNL-P prompts for syntactic and semantic accuracy, applying static analysis techniques to NL for the first time. Extensive experiments demonstrate that CNL-P enhances the quality of LLM responses through the novel and organic synergy of PE and SE. We believe that CNL-P can bridge the gap between emerging PE and traditional SE, laying the foundation for a new programming paradigm centered around NL.
- Health & Medicine (0.67)
- Energy (0.67)
- Education (0.46)
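The abstract does not publish CNL-P's grammar, but the core idea of a structured, lintable prompt can be sketched. The `@key: value` section syntax, section names, and lint rules below are invented for illustration only; the paper's actual grammar and checks differ.

```python
import re

# Hypothetical CNL-P-style prompt with declarative, machine-checkable sections.
PROMPT = """\
@role: customer-service assistant
@task: answer billing questions politely
@constraint: never reveal internal account IDs
@output-format: JSON with keys "answer" and "confidence"
"""

REQUIRED_SECTIONS = ("@role", "@task", "@output-format")

def lint(prompt_text):
    """Minimal static analysis in the spirit of the paper's linting tool:
    every required section appears exactly once, and every non-empty line
    follows the '@key: value' pattern."""
    errors = []
    lines = prompt_text.splitlines()
    for section in REQUIRED_SECTIONS:
        count = sum(line.startswith(section + ":") for line in lines)
        if count != 1:
            errors.append(f"section {section} appears {count} times (expected 1)")
    for line in lines:
        if line and not re.match(r"^@[\w-]+:\s+\S", line):
            errors.append(f"malformed line: {line!r}")
    return errors
```

Because the prompt has a fixed grammar, errors like a missing output-format section are caught before the prompt ever reaches the model, which is exactly the SE-style guarantee a plain NL prompt cannot offer.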
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
Nikitin, Alexander, Kossen, Jannik, Gal, Yarin, Marttinen, Pekka
Uncertainty quantification in Large Language Models (LLMs) is crucial for applications where safety and reliability are important. In particular, uncertainty can be used to improve the trustworthiness of LLMs by detecting factually incorrect model responses, commonly called hallucinations. Critically, one should seek to capture the model's semantic uncertainty, i.e., the uncertainty over the meanings of LLM outputs, rather than uncertainty over lexical or syntactic variations that do not affect answer correctness. To address this problem, we propose Kernel Language Entropy (KLE), a novel method for uncertainty estimation in white- and black-box LLMs. KLE defines positive semidefinite unit trace kernels to encode the semantic similarities of LLM outputs and quantifies uncertainty using the von Neumann entropy. It considers pairwise semantic dependencies between answers (or semantic clusters), providing more fine-grained uncertainty estimates than previous methods based on hard clustering of answers. We theoretically prove that KLE generalizes the previous state-of-the-art method called semantic entropy and empirically demonstrate that it improves uncertainty quantification performance across multiple natural language generation datasets and LLM architectures.
- Research Report > New Finding (0.92)
- Research Report > Promising Solution (0.68)
- Research Report > Experimental Study (0.68)
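The computational core of KLE as described in the abstract is concrete enough to sketch: normalize a positive semidefinite semantic-similarity kernel to unit trace, then take its von Neumann entropy via the eigenvalues. This is a minimal sketch of that computation only; how the similarity kernel itself is built from LLM outputs is the paper's contribution and is not reproduced here.

```python
import numpy as np

def von_neumann_entropy(K):
    """Von Neumann entropy VNE(K) = -Tr(K log K) of a positive
    semidefinite, unit-trace matrix, via its eigenvalues."""
    eigvals = np.linalg.eigvalsh(K)
    eigvals = eigvals[eigvals > 1e-12]  # convention: 0 * log 0 = 0
    return float(-np.sum(eigvals * np.log(eigvals)))

def kernel_language_entropy(similarity):
    """Sketch of the KLE computation: rescale a semantic-similarity
    kernel over sampled answers to unit trace, then take its von
    Neumann entropy. Higher values mean more semantic uncertainty."""
    K = np.asarray(similarity, dtype=float)
    K = K / np.trace(K)  # enforce unit trace
    return von_neumann_entropy(K)

# Three semantically identical answers: rank-one kernel, entropy ~ 0.
agree = np.ones((3, 3))
# Three mutually unrelated answers: identity kernel, entropy log(3).
disagree = np.eye(3)
```

The two extreme cases show why this is finer-grained than hard clustering: intermediate similarity values produce intermediate entropies between 0 and log(n), rather than forcing each answer into exactly one cluster.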
The Digital Insider
Semantic Kernel: A bridge between large language models and your code
At first glance, building a large language model (LLM) like GPT-4 into your code might seem simple. The API is a single REST call, taking in text and returning a response based on the input. But in practice things get much more complicated than that. The API is perhaps better thought of as a domain boundary, where you're delivering prompts that define the format the model uses to deliver its output. But that's a critical point: LLMs can be as simple or as complex as you want them to be.
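The "single REST call" the article mentions can be made concrete with a stdlib-only sketch against the public OpenAI chat-completions endpoint; the API key and model name are placeholders, and error handling is omitted for brevity.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt, api_key, model="gpt-4"):
    """Assemble the single HTTP POST: the prompt goes in as a user
    message, the key in the Authorization header."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def complete(prompt, api_key, model="gpt-4"):
    """Send the request and pull the generated text out of the response."""
    with urllib.request.urlopen(build_request(prompt, api_key, model)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The call itself really is that short, which is the article's point: the complexity lives not in the HTTP plumbing but in the prompt string you put into `messages` and in what you do with the free-form text that comes back.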